Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 1005348 |
| Missing cells | 1118932 |
| Missing cells (%) | 7.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 373.4 MiB |
| Average record size in memory | 389.4 B |
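The missing-cell figures above can be cross-checked from numbers that appear elsewhere in this report. The sketch below is only an arithmetic check: it assumes the eight per-column missing counts listed in the alerts and per-variable tables are the complete set, and the variable names are illustrative.

```python
# Figures copied from the report; nothing here is recomputed from the raw data.
n_variables = 15
n_observations = 1_005_348

# Missing counts per column. Six come from the "Missing" alerts; the two
# 2901 counts come from per-variable tables and do not appear in the alerts.
missing_counts = [14_110, 14_110, 72_276, 221_093, 221_093, 570_448, 2_901, 2_901]

missing_cells = sum(missing_counts)
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 1118932
print(f"{missing_pct:.1f}%")  # 7.4%
```

Both results agree with the "Missing cells" rows, which suggests the percentage is taken over all cells (variables times observations).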
Variable types
| Categorical | 9 |
|---|---|
| Numeric | 6 |
YearofConstruction is highly correlated with BerRating and 1 other fields | High correlation |
BerRating is highly correlated with YearofConstruction and 1 other fields | High correlation |
GroundFloorArea(sq m) is highly correlated with TotalDeliveredEnergy | High correlation |
CO2Rating is highly correlated with YearofConstruction and 2 other fields | High correlation |
TotalDeliveredEnergy is highly correlated with GroundFloorArea(sq m) and 1 other fields | High correlation |
YearofConstruction is highly correlated with BerRating | High correlation |
BerRating is highly correlated with YearofConstruction and 2 other fields | High correlation |
CO2Rating is highly correlated with BerRating and 1 other fields | High correlation |
TotalDeliveredEnergy is highly correlated with BerRating and 1 other fields | High correlation |
YearofConstruction is highly correlated with BerRating | High correlation |
BerRating is highly correlated with YearofConstruction and 1 other fields | High correlation |
CO2Rating is highly correlated with BerRating | High correlation |
MainSpaceHeatingFuel is highly correlated with MainWaterHeatingFuel | High correlation |
MainWaterHeatingFuel is highly correlated with MainSpaceHeatingFuel | High correlation |
YearofConstruction is highly correlated with EnergyRating | High correlation |
EnergyRating is highly correlated with YearofConstruction and 2 other fields | High correlation |
BerRating is highly correlated with CO2Rating and 1 other fields | High correlation |
CO2Rating is highly correlated with BerRating and 1 other fields | High correlation |
MainSpaceHeatingFuel is highly correlated with MainWaterHeatingFuel | High correlation |
MainWaterHeatingFuel is highly correlated with MainSpaceHeatingFuel | High correlation |
VentilationMethod is highly correlated with EnergyRating | High correlation |
InsulationType is highly correlated with EnergyRating | High correlation |
TotalDeliveredEnergy is highly correlated with BerRating and 1 other fields | High correlation |
MainSpaceHeatingFuel has 14110 (1.4%) missing values | Missing |
MainWaterHeatingFuel has 14110 (1.4%) missing values | Missing |
StructureType has 72276 (7.2%) missing values | Missing |
InsulationType has 221093 (22.0%) missing values | Missing |
InsulationThickness has 221093 (22.0%) missing values | Missing |
TotalDeliveredEnergy has 570448 (56.7%) missing values | Missing |
BerRating is highly skewed (γ1 = 52.94408732) | Skewed |
CO2Rating is highly skewed (γ1 = 69.10763716) | Skewed |
TotalDeliveredEnergy is highly skewed (γ1 = 87.3140669) | Skewed |
InsulationThickness has 122900 (12.2%) zeros | Zeros |
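The γ1 values in the "Skewed" alerts are skewness coefficients. As a rough illustration, the sketch below computes the population (biased) Fisher-Pearson coefficient with the standard library. Which estimator the report itself uses is not stated here, so treat this as an assumption; for about a million rows the biased and bias-adjusted versions differ only negligibly. The function name `skewness` and the toy data are hypothetical.

```python
from statistics import fmean


def skewness(values):
    """Population Fisher-Pearson skewness: g1 = m3 / m2 ** 1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
heavy_right_tail = [1, 1, 1, 1, 100]  # one extreme value, like the rating maxima

print(skewness(symmetric))         # 0.0
print(skewness(heavy_right_tail))  # positive: the tail is on the right
```

A handful of extreme values, such as the BerRating maximum of 56423.71 against a median of 209.93, is enough to push γ1 into the tens.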
Reproduction
| Analysis started | 2022-07-21 20:52:21.787527 |
|---|---|
| Analysis finished | 2022-07-21 20:53:03.498411 |
| Duration | 41.71 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
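The reported duration follows directly from the two timestamps. A minimal check, using only the values shown in the table:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2022-07-21 20:52:21.787527")
finished = datetime.fromisoformat("2022-07-21 20:53:03.498411")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 41.71 seconds
```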
CountyName
Categorical
| Distinct | 26 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 984.6 KiB |
| Dublin | 298379 |
|---|---|
| Cork | 113668 |
| Galway | 54664 |
| Kildare | 44498 |
| Limerick | 43389 |
| Other values (21) | 450750 |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 6.118182958 |
| Min length | 4 |
Characters and Unicode
| Total characters | 6150903 |
|---|---|
| Distinct characters | 34 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
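The "categories" counted here are Unicode general categories. The sketch below shows one way to derive such counts with Python's `unicodedata` module, applied to the five sample rows listed for this variable. It is an illustration, not the profiler's own code; scripts and blocks are left out because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for CountyName.
sample = ["Donegal", "Kildare", "Dublin", "Dublin", "Dublin"]

# "Lu" is Uppercase Letter and "Ll" is Lowercase Letter.
category_counts = Counter(
    unicodedata.category(ch) for value in sample for ch in value
)
distinct_characters = {ch for value in sample for ch in value}

print(dict(category_counts))     # {'Lu': 5, 'Ll': 27}
print(len(distinct_characters))  # 13
```

Each county name starts with one capital letter, which is why the full column has exactly as many uppercase letters (1005348) as observations.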
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Donegal |
|---|---|
| 2nd row | Kildare |
| 3rd row | Dublin |
| 4th row | Dublin |
| 5th row | Dublin |
Common Values
| Value | Count | Frequency (%) |
| Dublin | 298379 | 29.7% |
| Cork | 113668 | 11.3% |
| Galway | 54664 | 5.4% |
| Kildare | 44498 | 4.4% |
| Limerick | 43389 | 4.3% |
| Meath | 38687 | 3.8% |
| Wexford | 33527 | 3.3% |
| Tipperary | 32036 | 3.2% |
| Kerry | 31871 | 3.2% |
| Donegal | 31259 | 3.1% |
| Other values (16) | 283370 | 28.2% |
Length (plot not shown)
Words
| Value | Count | Frequency (%) |
| dublin | 298379 | |
| cork | 113668 | 11.3% |
| galway | 54664 | 5.4% |
| kildare | 44498 | 4.4% |
| limerick | 43389 | 4.3% |
| meath | 38687 | 3.8% |
| wexford | 33527 | 3.3% |
| tipperary | 32036 | 3.2% |
| kerry | 31871 | 3.2% |
| donegal | 31259 | 3.1% |
| Other values (16) | 283370 |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 554431 | 9.0% |
| l | 542786 | 8.8% |
| r | 471170 | 7.7% |
| a | 444420 | 7.2% |
| n | 417852 | 6.8% |
| o | 398954 | 6.5% |
| e | 367646 | 6.0% |
| D | 329638 | 5.4% |
| u | 327232 | 5.3% |
| b | 298379 | 4.9% |
| Other values (24) | 1998395 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5145555 | |
| Uppercase Letter | 1005348 | 16.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 554431 | |
| l | 542786 | |
| r | 471170 | |
| a | 444420 | 8.6% |
| n | 417852 | 8.1% |
| o | 398954 | 7.8% |
| e | 367646 | 7.1% |
| u | 327232 | 6.4% |
| b | 298379 | 5.8% |
| k | 203681 | 4.0% |
| Other values (13) | 1119004 |
Uppercase Letter
| Value | Count | Frequency (%) |
| D | 329638 | |
| C | 167180 | |
| W | 108411 | 10.8% |
| L | 103970 | 10.3% |
| K | 92868 | 9.2% |
| M | 75617 | 7.5% |
| G | 54664 | 5.4% |
| T | 32036 | 3.2% |
| S | 15288 | 1.5% |
| O | 13452 | 1.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6150903 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 554431 | 9.0% |
| l | 542786 | 8.8% |
| r | 471170 | 7.7% |
| a | 444420 | 7.2% |
| n | 417852 | 6.8% |
| o | 398954 | 6.5% |
| e | 367646 | 6.0% |
| D | 329638 | 5.4% |
| u | 327232 | 5.3% |
| b | 298379 | 4.9% |
| Other values (24) | 1998395 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6150903 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 554431 | 9.0% |
| l | 542786 | 8.8% |
| r | 471170 | 7.7% |
| a | 444420 | 7.2% |
| n | 417852 | 6.8% |
| o | 398954 | 6.5% |
| e | 367646 | 6.0% |
| D | 329638 | 5.4% |
| u | 327232 | 5.3% |
| b | 298379 | 4.9% |
| Other values (24) | 1998395 |
DwellingTypeDescr
Categorical
| Distinct | 11 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 983.0 KiB |
| Detached house | 291545 |
|---|---|
| Semi-detached house | 272815 |
| Mid-terrace house | 140621 |
| End of terrace house | 77554 |
| Mid-floor apartment | 65189 |
| Other values (6) | 157624 |
Length
| Max length | 22 |
|---|---|
| Median length | 20 |
| Mean length | 16.91869084 |
| Min length | 5 |
Characters and Unicode
| Total characters | 17009172 |
|---|---|
| Distinct characters | 29 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Detached house |
|---|---|
| 2nd row | Detached house |
| 3rd row | Semi-detached house |
| 4th row | Semi-detached house |
| 5th row | Semi-detached house |
Common Values
| Value | Count | Frequency (%) |
| Detached house | 291545 | 29.0% |
| Semi-detached house | 272815 | 27.1% |
| Mid-terrace house | 140621 | 14.0% |
| End of terrace house | 77554 | 7.7% |
| Mid-floor apartment | 65189 | 6.5% |
| Top-floor apartment | 56080 | 5.6% |
| Ground-floor apartment | 54161 | 5.4% |
| House | 33288 | 3.3% |
| Maisonette | 10920 | 1.1% |
| Apartment | 2856 | 0.3% |
Length (plot not shown)
Words
| Value | Count | Frequency (%) |
| house | 815823 | |
| detached | 291545 | 13.8% |
| semi-detached | 272815 | 12.9% |
| apartment | 178286 | 8.4% |
| mid-terrace | 140621 | 6.6% |
| end | 77554 | 3.7% |
| of | 77554 | 3.7% |
| terrace | 77554 | 3.7% |
| mid-floor | 65189 | 3.1% |
| top-floor | 56080 | 2.6% |
| Other values (4) | 65719 | 3.1% |
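The lowercase entries in the table above look like whitespace-separated tokens of the category labels, weighted by how often each label occurs. The sketch below reproduces a few of the counts from the ten labels listed under Common Values. The tokenisation rule (lowercase, split on whitespace, hyphens kept) is an assumption inferred from the table, not taken from the profiler's documentation.

```python
from collections import Counter

# Label counts copied from the "Common Values" table (ten listed labels).
value_counts = {
    "Detached house": 291545,
    "Semi-detached house": 272815,
    "Mid-terrace house": 140621,
    "End of terrace house": 77554,
    "Mid-floor apartment": 65189,
    "Top-floor apartment": 56080,
    "Ground-floor apartment": 54161,
    "House": 33288,
    "Maisonette": 10920,
    "Apartment": 2856,
}

word_counts = Counter()
for label, count in value_counts.items():
    for word in label.lower().split():  # hyphenated words stay together
        word_counts[word] += count

print(word_counts["house"])      # 815823
print(word_counts["apartment"])  # 178286
print(word_counts["end"])        # 77554
```

The totals for "house" and "apartment" match the table, which supports the assumed tokenisation.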
Most occurring characters
| Value | Count | Frequency (%) |
| e | 2854791 | |
| o | 1365398 | 8.0% |
| h | 1346895 | 7.9% |
| d | 1174700 | 6.9% |
| t | 1161266 | 6.8% |
| a | 1147490 | 6.7% |
| (space) | 1113392 | 6.5% |
| u | 869984 | 5.1% |
| r | 844227 | 5.0% |
| s | 827062 | 4.9% |
| Other values (19) | 4303967 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 14301247 | |
| Space Separator | 1113392 | 6.5% |
| Uppercase Letter | 1005667 | 5.9% |
| Dash Punctuation | 588866 | 3.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 2854791 | |
| o | 1365398 | |
| h | 1346895 | |
| d | 1174700 | |
| t | 1161266 | |
| a | 1147490 | |
| u | 869984 | 6.1% |
| r | 844227 | 5.9% |
| s | 827062 | 5.8% |
| c | 782535 | 5.5% |
| Other values (8) | 1926899 |
Uppercase Letter
| Value | Count | Frequency (%) |
| D | 291864 | |
| S | 272815 | |
| M | 216730 | |
| E | 77554 | 7.7% |
| T | 56080 | 5.6% |
| G | 54161 | 5.4% |
| H | 33288 | 3.3% |
| A | 2856 | 0.3% |
| B | 319 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1113392 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 588866 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 15306914 | |
| Common | 1702258 | 10.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 2854791 | |
| o | 1365398 | |
| h | 1346895 | |
| d | 1174700 | |
| t | 1161266 | 7.6% |
| a | 1147490 | 7.5% |
| u | 869984 | 5.7% |
| r | 844227 | 5.5% |
| s | 827062 | 5.4% |
| c | 782535 | 5.1% |
| Other values (17) | 2932566 |
Common
| Value | Count | Frequency (%) |
| (space) | 1113392 | |
| - | 588866 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 17009172 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 2854791 | |
| o | 1365398 | 8.0% |
| h | 1346895 | 7.9% |
| d | 1174700 | 6.9% |
| t | 1161266 | 6.8% |
| a | 1147490 | 6.7% |
| (space) | 1113392 | 6.5% |
| u | 869984 | 5.1% |
| r | 844227 | 5.0% |
| s | 827062 | 4.9% |
| Other values (19) | 4303967 |
YearofConstruction
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 261 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1982.779243 |
| Minimum | 1753 |
|---|---|
| Maximum | 2022 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.7 MiB |
Quantile statistics
| Minimum | 1753 |
|---|---|
| 5-th percentile | 1900 |
| Q1 | 1972 |
| median | 1996 |
| Q3 | 2005 |
| 95-th percentile | 2018 |
| Maximum | 2022 |
| Range | 269 |
| Interquartile range (IQR) | 33 |
Descriptive statistics
| Standard deviation | 33.9271398 |
|---|---|
| Coefficient of variation (CV) | 0.01711090124 |
| Kurtosis | 3.820401463 |
| Mean | 1982.779243 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | -1.793886072 |
| Sum | 1993383146 |
| Variance | 1151.050815 |
| Monotonicity | Not monotonic |
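Several of the statistics above are simple functions of the others. The sketch below recomputes four of them from the reported values for YearofConstruction. It is only a consistency check on the printed figures, and the variable names are illustrative.

```python
# Values copied from the YearofConstruction tables.
minimum, maximum = 1753, 2022
q1, q3 = 1972, 2005
mean, std = 1982.779243, 33.9271398

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr)    # 269 33
print(round(cv, 6))        # 0.017111
print(round(variance, 2))  # 1151.05
```

The very small CV reflects that construction years cluster tightly relative to their magnitude, so it says little about spread in practical terms; the IQR of 33 years is the more informative figure.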
| Value | Count | Frequency (%) |
| 2006 | 49227 | 4.9% |
| 2004 | 47123 | 4.7% |
| 2005 | 45820 | 4.6% |
| 2003 | 38902 | 3.9% |
| 2007 | 36824 | 3.7% |
| 2002 | 32444 | 3.2% |
| 1900 | 30916 | 3.1% |
| 2000 | 29095 | 2.9% |
| 2001 | 25248 | 2.5% |
| 1998 | 24029 | 2.4% |
| Other values (251) | 645720 |
| Value | Count | Frequency (%) |
| 1753 | 14 | < 0.1% |
| 1757 | 1 | < 0.1% |
| 1759 | 3 | < 0.1% |
| 1760 | 223 | |
| 1761 | 8 | < 0.1% |
| 1762 | 1 | < 0.1% |
| 1764 | 1 | < 0.1% |
| 1765 | 5 | < 0.1% |
| 1766 | 1 | < 0.1% |
| 1767 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 2022 | 4870 | 0.5% |
| 2021 | 10807 | |
| 2020 | 14814 | |
| 2019 | 18371 | |
| 2018 | 13715 | |
| 2017 | 10543 | |
| 2016 | 7770 | |
| 2015 | 4870 | 0.5% |
| 2014 | 3208 | 0.3% |
| 2013 | 2179 | 0.2% |
EnergyRating
Categorical
| Distinct | 15 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 983.3 KiB |
| C2 | 124530 |
|---|---|
| C3 | 118185 |
| D1 | 114352 |
| C1 | 113782 |
| D2 | 98187 |
| Other values (10) | 436312 |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 1.887439971 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1897534 |
|---|---|
| Distinct characters | 10 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | C2 |
|---|---|
| 2nd row | B3 |
| 3rd row | C3 |
| 4th row | C2 |
| 5th row | D2 |
Common Values
| Value | Count | Frequency (%) |
| C2 | 124530 | 12.4% |
| C3 | 118185 | 11.8% |
| D1 | 114352 | 11.4% |
| C1 | 113782 | 11.3% |
| D2 | 98187 | 9.8% |
| B3 | 77970 | 7.8% |
| G | 66815 | 6.6% |
| E1 | 56631 | 5.6% |
| A3 | 51194 | 5.1% |
| F | 46347 | 4.6% |
| Other values (5) | 137355 | 13.7% |
Length (plot not shown)
Words
| Value | Count | Frequency (%) |
| c2 | 124530 | |
| c3 | 118185 | |
| d1 | 114352 | |
| c1 | 113782 | |
| d2 | 98187 | |
| b3 | 77970 | |
| g | 66815 | |
| e1 | 56631 | |
| a3 | 51194 | 5.1% |
| f | 46347 | 4.6% |
| Other values (5) | 137355 |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 356497 | |
| 2 | 343553 | |
| 1 | 301284 | |
| 3 | 247349 | |
| D | 212539 | |
| B | 126045 | 6.6% |
| E | 101411 | 5.3% |
| A | 95694 | 5.0% |
| G | 66815 | 3.5% |
| F | 46347 | 2.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 1005348 | |
| Decimal Number | 892186 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 356497 | |
| D | 212539 | |
| B | 126045 | 12.5% |
| E | 101411 | 10.1% |
| A | 95694 | 9.5% |
| G | 66815 | 6.6% |
| F | 46347 | 4.6% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 343553 | |
| 1 | 301284 | |
| 3 | 247349 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1005348 | |
| Common | 892186 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 356497 | |
| D | 212539 | |
| B | 126045 | 12.5% |
| E | 101411 | 10.1% |
| A | 95694 | 9.5% |
| G | 66815 | 6.6% |
| F | 46347 | 4.6% |
Common
| Value | Count | Frequency (%) |
| 2 | 343553 | |
| 1 | 301284 | |
| 3 | 247349 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1897534 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 356497 | |
| 2 | 343553 | |
| 1 | 301284 | |
| 3 | 247349 | |
| D | 212539 | |
| B | 126045 | 6.6% |
| E | 101411 | 5.3% |
| A | 95694 | 5.0% |
| G | 66815 | 3.5% |
| F | 46347 | 2.4% |
BerRating
Real number (ℝ)
| Distinct | 77722 |
|---|---|
| Distinct (%) | 7.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 238.9152415 |
| Minimum | -158.42 |
|---|---|
| Maximum | 56423.71 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 161 |
| Negative (%) | < 0.1% |
| Memory size | 7.7 MiB |
Quantile statistics
| Minimum | -158.42 |
|---|---|
| 5-th percentile | 51.89 |
| Q1 | 158.08 |
| median | 209.93 |
| Q3 | 285.29 |
| 95-th percentile | 497.57 |
| Maximum | 56423.71 |
| Range | 56582.13 |
| Interquartile range (IQR) | 127.21 |
Descriptive statistics
| Standard deviation | 173.5624388 |
|---|---|
| Coefficient of variation (CV) | 0.7264603034 |
| Kurtosis | 13616.76436 |
| Mean | 238.9152415 |
| Median Absolute Deviation (MAD) | 60.88 |
| Skewness | 52.94408732 |
| Sum | 240192960.2 |
| Variance | 30123.92017 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 224.56 | 102 | < 0.1% |
| 224.87 | 93 | < 0.1% |
| 224.85 | 93 | < 0.1% |
| 224.63 | 89 | < 0.1% |
| 224.82 | 87 | < 0.1% |
| 224.95 | 87 | < 0.1% |
| 174.81 | 87 | < 0.1% |
| 224.86 | 86 | < 0.1% |
| 174.92 | 86 | < 0.1% |
| 174.79 | 86 | < 0.1% |
| Other values (77712) | 1004452 |
| Value | Count | Frequency (%) |
| -158.42 | 1 | |
| -97.37 | 1 | |
| -63.96 | 1 | |
| -60.97 | 1 | |
| -56.06 | 1 | |
| -49.16 | 1 | |
| -48.01 | 1 | |
| -45.32 | 1 | |
| -44.66 | 1 | |
| -43.64 | 1 |
| Value | Count | Frequency (%) |
| 56423.71 | 1 | |
| 32134.94 | 1 | |
| 31623.33 | 1 | |
| 21725.62 | 1 | |
| 18771.31 | 1 | |
| 13914.78 | 1 | |
| 11823.78 | 1 | |
| 11476.29 | 1 | |
| 9892.94 | 1 | |
| 9183.17 | 1 |
GroundFloorArea(sq m)
Real number (ℝ≥0)
| Distinct | 43114 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 114.4886734 |
| Minimum | 5.47 |
|---|---|
| Maximum | 3546.11 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.7 MiB |
Quantile statistics
| Minimum | 5.47 |
|---|---|
| 5-th percentile | 47.66 |
| Q1 | 77.9 |
| median | 100.3 |
| Q3 | 134.48 |
| 95-th percentile | 231.21 |
| Maximum | 3546.11 |
| Range | 3540.64 |
| Interquartile range (IQR) | 56.58 |
Descriptive statistics
| Standard deviation | 60.4344775 |
|---|---|
| Coefficient of variation (CV) | 0.5278642481 |
| Kurtosis | 35.93359276 |
| Mean | 114.4886734 |
| Median Absolute Deviation (MAD) | 26.34 |
| Skewness | 2.823807777 |
| Sum | 115100958.8 |
| Variance | 3652.326071 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 81 | 1324 | 0.1% |
| 80 | 1139 | 0.1% |
| 84 | 964 | 0.1% |
| 90 | 960 | 0.1% |
| 82 | 952 | 0.1% |
| 78 | 783 | 0.1% |
| 88 | 774 | 0.1% |
| 70 | 747 | 0.1% |
| 85 | 734 | 0.1% |
| 108 | 694 | 0.1% |
| Other values (43104) | 996277 |
| Value | Count | Frequency (%) |
| 5.47 | 1 | |
| 6.7 | 1 | |
| 7.21 | 1 | |
| 7.26 | 1 | |
| 7.47 | 1 | |
| 7.7 | 1 | |
| 7.91 | 1 | |
| 7.96 | 1 | |
| 8.3 | 1 | |
| 8.31 | 1 |
| Value | Count | Frequency (%) |
| 3546.11 | 1 | |
| 3229.39 | 1 | |
| 2331.92 | 1 | |
| 2011.25 | 1 | |
| 1825.99 | 1 | |
| 1788.6 | 1 | |
| 1705.02 | 1 | |
| 1625.34 | 1 | |
| 1593 | 1 | |
| 1572.51 | 1 |
CO2Rating
Real number (ℝ)
| Distinct | 29089 |
|---|---|
| Distinct (%) | 2.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 55.64251746 |
| Minimum | -88.57 |
|---|---|
| Maximum | 18417.1 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 185 |
| Negative (%) | < 0.1% |
| Memory size | 7.7 MiB |
Quantile statistics
| Minimum | -88.57 |
|---|---|
| 5-th percentile | 9.91 |
| Q1 | 33.55 |
| median | 46.98 |
| Q3 | 65.91 |
| 95-th percentile | 124.63 |
| Maximum | 18417.1 |
| Range | 18505.67 |
| Interquartile range (IQR) | 32.36 |
Descriptive statistics
| Standard deviation | 48.95306043 |
|---|---|
| Coefficient of variation (CV) | 0.8797779588 |
| Kurtosis | 22151.18262 |
| Mean | 55.64251746 |
| Median Absolute Deviation (MAD) | 15.44 |
| Skewness | 69.10763716 |
| Sum | 55940093.64 |
| Variance | 2396.402125 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 41.39 | 236 | < 0.1% |
| 43.4 | 233 | < 0.1% |
| 44.12 | 232 | < 0.1% |
| 36.44 | 231 | < 0.1% |
| 41.44 | 228 | < 0.1% |
| 38.77 | 227 | < 0.1% |
| 40.98 | 226 | < 0.1% |
| 42.08 | 226 | < 0.1% |
| 37.44 | 226 | < 0.1% |
| 41.45 | 226 | < 0.1% |
| Other values (29079) | 1003057 |
| Value | Count | Frequency (%) |
| -88.57 | 1 | |
| -27.98 | 1 | |
| -23.82 | 1 | |
| -20.25 | 1 | |
| -17.51 | 1 | |
| -16.02 | 1 | |
| -14.49 | 1 | |
| -11.99 | 1 | |
| -11.22 | 1 | |
| -11.02 | 1 |
| Value | Count | Frequency (%) |
| 18417.1 | 1 | |
| 10541 | 1 | |
| 5840.25 | 1 | |
| 4019.2 | 1 | |
| 3467.31 | 1 | |
| 3327.24 | 1 | |
| 3283.77 | 1 | |
| 2822.29 | 1 | |
| 2817.23 | 1 | |
| 2760.17 | 1 |
MainSpaceHeatingFuel
Categorical
| Distinct | 20 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 14110 |
| Missing (%) | 1.4% |
| Memory size | 64.5 MiB |
| Mains Gas | 382337 |
|---|---|
| Heating Oil | 364334 |
| Electricity | 185279 |
| Solid Multi-Fuel | 30996 |
| Bulk LPG (propane or butane) | 13573 |
| Other values (15) | 14719 |
Length
| Max length | 30 |
|---|---|
| Median length | 11 |
| Mean length | 10.74701232 |
| Min length | 8 |
Characters and Unicode
| Total characters | 10652847 |
|---|---|
| Distinct characters | 42 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Heating Oil |
|---|---|
| 2nd row | Heating Oil |
| 3rd row | Mains Gas |
| 4th row | Mains Gas |
| 5th row | Mains Gas |
Common Values
| Value | Count | Frequency (%) |
| Mains Gas | 382337 | 38.0% |
| Heating Oil | 364334 | 36.2% |
| Electricity | 185279 | 18.4% |
| Solid Multi-Fuel | 30996 | 3.1% |
| Bulk LPG (propane or butane) | 13573 | 1.4% |
| Manufactured Smokeless Fuel | 6614 | 0.7% |
| House Coal | 3120 | 0.3% |
| Wood Pellets (bulk supply for | 1321 | 0.1% |
| Sod Peat | 1221 | 0.1% |
| Bottled LPG | 1099 | 0.1% |
| Other values (10) | 1344 | 0.1% |
| (Missing) | 14110 | 1.4% |
Length (plot not shown)
Words
| Value | Count | Frequency (%) |
| mains | 382337 | |
| gas | 382337 | |
| heating | 364334 | |
| oil | 364334 | |
| electricity | 185369 | |
| solid | 30996 | 1.7% |
| multi-fuel | 30996 | 1.7% |
| bulk | 14894 | 0.8% |
| lpg | 14672 | 0.8% |
| butane | 13573 | 0.7% |
| Other values (34) | 65605 | 3.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 1544425 | |
| a | 1174393 | |
| (space) | 858209 | 8.1% |
| t | 792255 | 7.4% |
| s | 785310 | 7.4% |
| n | 780742 | 7.3% |
| l | 679417 | 6.4% |
| e | 643982 | 6.0% |
| M | 419947 | 3.9% |
| G | 397009 | 3.7% |
| Other values (32) | 2577158 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7870541 | |
| Uppercase Letter | 1864230 | 17.5% |
| Space Separator | 858209 | 8.1% |
| Dash Punctuation | 31194 | 0.3% |
| Open Punctuation | 15100 | 0.1% |
| Close Punctuation | 13573 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 1544425 | |
| a | 1174393 | |
| t | 792255 | |
| s | 785310 | |
| n | 780742 | |
| l | 679417 | |
| e | 643982 | |
| c | 377622 | 4.8% |
| g | 365252 | 4.6% |
| r | 221036 | 2.8% |
| Other values (12) | 506107 | 6.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 419947 | |
| G | 397009 | |
| H | 367454 | |
| O | 364388 | |
| E | 185369 | |
| S | 38867 | 2.1% |
| F | 37610 | 2.0% |
| P | 17689 | 0.9% |
| L | 15330 | 0.8% |
| B | 14946 | 0.8% |
| Other values (6) | 5621 | 0.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 858209 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 31194 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 15100 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 13573 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9734771 | |
| Common | 918076 | 8.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 1544425 | |
| a | 1174393 | |
| t | 792255 | 8.1% |
| s | 785310 | 8.1% |
| n | 780742 | 8.0% |
| l | 679417 | 7.0% |
| e | 643982 | 6.6% |
| M | 419947 | 4.3% |
| G | 397009 | 4.1% |
| c | 377622 | 3.9% |
| Other values (28) | 2139669 |
Common
| Value | Count | Frequency (%) |
| (space) | 858209 | |
| - | 31194 | 3.4% |
| ( | 15100 | 1.6% |
| ) | 13573 | 1.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10652847 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 1544425 | |
| a | 1174393 | |
| (space) | 858209 | 8.1% |
| t | 792255 | 7.4% |
| s | 785310 | 7.4% |
| n | 780742 | 7.3% |
| l | 679417 | 6.4% |
| e | 643982 | 6.0% |
| M | 419947 | 3.9% |
| G | 397009 | 3.7% |
| Other values (32) | 2577158 |
MainWaterHeatingFuel
Categorical
| Distinct | 21 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 14110 |
| Missing (%) | 1.4% |
| Memory size | 64.5 MiB |
| Mains Gas | 380320 |
|---|---|
| Heating Oil | 361268 |
| Electricity | 193666 |
| Solid Multi-Fuel | 28848 |
| Bulk LPG (propane or butane) | 13520 |
| Other values (16) | 13616 |
Length
| Max length | 30 |
|---|---|
| Median length | 11 |
| Mean length | 10.72271644 |
| Min length | 4 |
Characters and Unicode
| Total characters | 10628764 |
|---|---|
| Distinct characters | 42 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Heating Oil |
|---|---|
| 2nd row | Heating Oil |
| 3rd row | Mains Gas |
| 4th row | Mains Gas |
| 5th row | Mains Gas |
Common Values
| Value | Count | Frequency (%) |
| Mains Gas | 380320 | 37.8% |
| Heating Oil | 361268 | 35.9% |
| Electricity | 193666 | 19.3% |
| Solid Multi-Fuel | 28848 | 2.9% |
| Bulk LPG (propane or butane) | 13520 | 1.3% |
| Manufactured Smokeless Fuel | 5604 | 0.6% |
| House Coal | 2959 | 0.3% |
| Wood Pellets (bulk supply for | 1271 | 0.1% |
| Sod Peat | 1244 | 0.1% |
| Bottled LPG | 1204 | 0.1% |
| Other values (11) | 1334 | 0.1% |
| (Missing) | 14110 | 1.4% |
Length (plot not shown)
Words
| Value | Count | Frequency (%) |
| mains | 380320 | |
| gas | 380320 | |
| heating | 361268 | |
| oil | 361268 | |
| electricity | 193757 | |
| solid | 28848 | 1.6% |
| multi-fuel | 28848 | 1.6% |
| bulk | 14791 | 0.8% |
| lpg | 14724 | 0.8% |
| butane | 13520 | 0.7% |
| Other values (35) | 62126 | 3.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 1548811 | |
| a | 1165086 | |
| (space) | 848552 | 8.0% |
| t | 803001 | 7.6% |
| s | 779006 | 7.3% |
| n | 774603 | 7.3% |
| l | 676008 | 6.4% |
| e | 642964 | 6.0% |
| M | 414772 | 3.9% |
| G | 395044 | 3.7% |
| Other values (32) | 2580917 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7869913 | |
| Uppercase Letter | 1852748 | 17.4% |
| Space Separator | 848552 | 8.0% |
| Dash Punctuation | 29021 | 0.3% |
| Open Punctuation | 15010 | 0.1% |
| Close Punctuation | 13520 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 1548811 | |
| a | 1165086 | |
| t | 803001 | |
| s | 779006 | |
| n | 774603 | |
| l | 676008 | |
| e | 642964 | |
| c | 393424 | 5.0% |
| g | 362123 | 4.6% |
| r | 228326 | 2.9% |
| Other values (12) | 496561 | 6.3% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 414772 | |
| G | 395044 | |
| H | 364227 | |
| O | 361309 | |
| E | 193757 | |
| S | 35746 | 1.9% |
| F | 34452 | 1.9% |
| P | 17724 | 1.0% |
| L | 15319 | 0.8% |
| B | 15003 | 0.8% |
| Other values (6) | 5395 | 0.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 848552 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 29021 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 15010 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 13520 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9722661 | |
| Common | 906103 | 8.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 1548811 | |
| a | 1165086 | |
| t | 803001 | 8.3% |
| s | 779006 | 8.0% |
| n | 774603 | 8.0% |
| l | 676008 | 7.0% |
| e | 642964 | 6.6% |
| M | 414772 | 4.3% |
| G | 395044 | 4.1% |
| c | 393424 | 4.0% |
| Other values (28) | 2129942 |
Common
| Value | Count | Frequency (%) |
| (space) | 848552 | |
| - | 29021 | 3.2% |
| ( | 15010 | 1.7% |
| ) | 13520 | 1.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10628764 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 1548811 | |
| a | 1165086 | |
| (space) | 848552 | 8.0% |
| t | 803001 | 7.6% |
| s | 779006 | 7.3% |
| n | 774603 | 7.3% |
| l | 676008 | 6.4% |
| e | 642964 | 6.0% |
| M | 414772 | 3.9% |
| G | 395044 | 3.7% |
| Other values (32) | 2580917 |
VentilationMethod
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2901 |
| Missing (%) | 0.3% |
| Memory size | 67.8 MiB |
| Natural vent. | 942726 |
|---|---|
| Bal.whole mech.vent heat recvr | 31490 |
| Whole house extract vent. | 26036 |
| Pos input vent.- loft | 1378 |
| Bal.whole mech.vent no heat re | 429 |
Length
| Max length | 30 |
|---|---|
| Median length | 13 |
| Mean length | 13.86822246 |
| Min length | 13 |
Characters and Unicode
| Total characters | 13902158 |
|---|---|
| Distinct characters | 26 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Natural vent. |
|---|---|
| 2nd row | Natural vent. |
| 3rd row | Natural vent. |
| 4th row | Natural vent. |
| 5th row | Natural vent. |
Common Values
| Value | Count | Frequency (%) |
| Natural vent. | 942726 | 93.8% |
| Bal.whole mech.vent heat recvr | 31490 | 3.1% |
| Whole house extract vent. | 26036 | 2.6% |
| Pos input vent.- loft | 1378 | 0.1% |
| Bal.whole mech.vent no heat re | 429 | < 0.1% |
| Pos input vent.- outside | 388 | < 0.1% |
| (Missing) | 2901 | 0.3% |
Length and Category Frequency Plot (plots not shown)
Words
| Value | Count | Frequency (%) |
| vent | 970528 | |
| natural | 942726 | |
| bal.whole | 31919 | 1.5% |
| mech.vent | 31919 | 1.5% |
| heat | 31919 | 1.5% |
| recvr | 31490 | 1.5% |
| whole | 26036 | 1.2% |
| house | 26036 | 1.2% |
| extract | 26036 | 1.2% |
| pos | 1766 | 0.1% |
| Other values (5) | 4390 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 2032696 | |
| a | 1975326 | |
| e | 1208619 | |
| (space) | 1122318 | |
| . | 1034366 | |
| l | 1033978 | |
| v | 1033937 | |
| r | 1032171 | |
| n | 1004642 | |
| u | 970916 | |
| Other values (16) | 1453189 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 10741261 | |
| Space Separator | 1122318 | 8.1% |
| Other Punctuation | 1034366 | 7.4% |
| Uppercase Letter | 1002447 | 7.2% |
| Dash Punctuation | 1766 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 2032696 | |
| a | 1975326 | |
| e | 1208619 | |
| l | 1033978 | |
| v | 1033937 | |
| r | 1032171 | |
| n | 1004642 | |
| u | 970916 | |
| h | 147829 | 1.4% |
| c | 89445 | 0.8% |
| Other values (9) | 211702 | 2.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 942726 | |
| B | 31919 | 3.2% |
| W | 26036 | 2.6% |
| P | 1766 | 0.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1122318 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 1034366 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1766 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 11743708 | |
| Common | 2158450 | 15.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 2032696 | |
| a | 1975326 | |
| e | 1208619 | |
| l | 1033978 | |
| v | 1033937 | |
| r | 1032171 | |
| n | 1004642 | |
| u | 970916 | |
| N | 942726 | |
| h | 147829 | 1.3% |
| Other values (13) | 360868 | 3.1% |
Common
| Value | Count | Frequency (%) |
| (space) | 1122318 | |
| . | 1034366 | |
| - | 1766 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13902158 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 2032696 | |
| a | 1975326 | |
| e | 1208619 | |
| (space) | 1122318 | |
| . | 1034366 | |
| l | 1033978 | |
| v | 1033937 | |
| r | 1032171 | |
| n | 1004642 | |
| u | 970916 | |
| Other values (16) | 1453189 |
StructureType
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 72276 |
| Missing (%) | 7.2% |
| Memory size | 60.0 MiB |
| Masonry | 867161 |
|---|---|
| Timber or Steel Frame | 59584 |
| Insulated Conctete Form | 6327 |
Length
| Max length | 23 |
|---|---|
| Median length | 7 |
| Mean length | 8.002503558 |
| Min length | 7 |
Characters and Unicode
| Total characters | 7466912 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Masonry |
|---|---|
| 2nd row | Masonry |
| 3rd row | Masonry |
| 4th row | Masonry |
| 5th row | Masonry |
Common Values
| Value | Count | Frequency (%) |
| Masonry | 867161 | 86.3% |
| Timber or Steel Frame | 59584 | 5.9% |
| Insulated Conctete Form | 6327 | 0.6% |
| (Missing) | 72276 | 7.2% |
Length and Category Frequency Plot (plots not shown)
Words
| Value | Count | Frequency (%) |
| masonry | 867161 | |
| timber | 59584 | 5.3% |
| or | 59584 | 5.3% |
| steel | 59584 | 5.3% |
| frame | 59584 | 5.3% |
| insulated | 6327 | 0.6% |
| conctete | 6327 | 0.6% |
| form | 6327 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| r | 1052240 | |
| o | 939399 | |
| a | 933072 | |
| n | 879815 | |
| s | 873488 | |
| M | 867161 | |
| y | 867161 | |
| e | 257317 | 3.4% |
| (space) | 191406 | 2.6% |
| m | 125495 | 1.7% |
| Other values (12) | 480358 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 6210612 | |
| Uppercase Letter | 1064894 | 14.3% |
| Space Separator | 191406 | 2.6% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 1052240 | |
| o | 939399 | |
| a | 933072 | |
| n | 879815 | |
| s | 873488 | |
| y | 867161 | |
| e | 257317 | 4.1% |
| m | 125495 | 2.0% |
| t | 78565 | 1.3% |
| l | 65911 | 1.1% |
| Other values (5) | 138149 | 2.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 867161 | |
| F | 65911 | 6.2% |
| T | 59584 | 5.6% |
| S | 59584 | 5.6% |
| I | 6327 | 0.6% |
| C | 6327 | 0.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 191406 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 7275506 | |
| Common | 191406 | 2.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 1052240 | |
| o | 939399 | |
| a | 933072 | |
| n | 879815 | |
| s | 873488 | |
| M | 867161 | |
| y | 867161 | |
| e | 257317 | 3.5% |
| m | 125495 | 1.7% |
| t | 78565 | 1.1% |
| Other values (11) | 401793 | 5.5% |
Common
| Value | Count | Frequency (%) |
| (space) | 191406 | |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7466912 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| r | 1052240 | |
| o | 939399 | |
| a | 933072 | |
| n | 879815 | |
| s | 873488 | |
| M | 867161 | |
| y | 867161 | |
| e | 257317 | 3.4% |
| (space) | 191406 | 2.6% |
| m | 125495 | 1.7% |
| Other values (12) | 480358 |
NoOfSidesSheltered
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2901 |
| Missing (%) | 0.3% |
| Memory size | 57.5 MiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 3007341 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 2.0 |
| 3rd row | 3.0 |
| 4th row | 2.0 |
| 5th row | 2.0 |
Common Values
| Value | Count | Frequency (%) |
| 2.0 | 407774 | |
| 3.0 | 271434 | |
| 4.0 | 140400 | 14.0% |
| 1.0 | 118539 | 11.8% |
| 0.0 | 64300 | 6.4% |
| (Missing) | 2901 | 0.3% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 2.0 | 407774 | |
| 3.0 | 271434 | |
| 4.0 | 140400 | 14.0% |
| 1.0 | 118539 | 11.8% |
| 0.0 | 64300 | 6.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1066747 | |
| . | 1002447 | |
| 2 | 407774 | 13.6% |
| 3 | 271434 | 9.0% |
| 4 | 140400 | 4.7% |
| 1 | 118539 | 3.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2004894 | |
| Other Punctuation | 1002447 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1066747 | |
| 2 | 407774 | 20.3% |
| 3 | 271434 | 13.5% |
| 4 | 140400 | 7.0% |
| 1 | 118539 | 5.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 1002447 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3007341 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 1066747 | |
| . | 1002447 | |
| 2 | 407774 | 13.6% |
| 3 | 271434 | 9.0% |
| 4 | 140400 | 4.7% |
| 1 | 118539 | 3.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3007341 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1066747 | |
| . | 1002447 | |
| 2 | 407774 | 13.6% |
| 3 | 271434 | 9.0% |
| 4 | 140400 | 4.7% |
| 1 | 118539 | 3.9% |
InsulationType
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 221093 |
| Missing (%) | 22.0% |
| Memory size | 60.0 MiB |
Length
| Max length | 17 |
|---|---|
| Median length | 17 |
| Mean length | 14.21466615 |
| Min length | 4 |
Characters and Unicode
| Total characters | 11147923 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Factory Insulated |
|---|---|
| 2nd row | Factory Insulated |
| 3rd row | Loose Jacket |
| 4th row | Loose Jacket |
| 5th row | Factory Insulated |
Common Values
| Value | Count | Frequency (%) |
| Factory Insulated | 494467 | |
| Loose Jacket | 197854 | |
| None | 91934 | 9.1% |
| (Missing) | 221093 | 22.0% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| factory | 494467 | |
| insulated | 494467 | |
| loose | 197854 | |
| jacket | 197854 | |
| none | 91934 | 6.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 1186788 | 10.6% |
| a | 1186788 | 10.6% |
| o | 982109 | 8.8% |
| e | 982109 | 8.8% |
| c | 692321 | 6.2% |
| (space) | 692321 | 6.2% |
| s | 692321 | 6.2% |
| n | 586401 | 5.3% |
| u | 494467 | 4.4% |
| d | 494467 | 4.4% |
| Other values (9) | 3157831 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 8979026 | |
| Uppercase Letter | 1476576 | 13.2% |
| Space Separator | 692321 | 6.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 1186788 | |
| a | 1186788 | |
| o | 982109 | |
| e | 982109 | |
| c | 692321 | |
| s | 692321 | |
| n | 586401 | |
| u | 494467 | |
| d | 494467 | |
| l | 494467 | |
| Other values (3) | 1186788 |
Uppercase Letter
| Value | Count | Frequency (%) |
| F | 494467 | |
| I | 494467 | |
| L | 197854 | |
| J | 197854 | |
| N | 91934 | 6.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 692321 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 10455602 | |
| Common | 692321 | 6.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 1186788 | |
| a | 1186788 | |
| o | 982109 | 9.4% |
| e | 982109 | 9.4% |
| c | 692321 | 6.6% |
| s | 692321 | 6.6% |
| n | 586401 | 5.6% |
| u | 494467 | 4.7% |
| d | 494467 | 4.7% |
| l | 494467 | 4.7% |
| Other values (8) | 2663364 |
Common
| Value | Count | Frequency (%) |
| (space) | 692321 | |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11147923 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 1186788 | 10.6% |
| a | 1186788 | 10.6% |
| o | 982109 | 8.8% |
| e | 982109 | 8.8% |
| c | 692321 | 6.2% |
| (space) | 692321 | 6.2% |
| s | 692321 | 6.2% |
| n | 586401 | 5.3% |
| u | 494467 | 4.4% |
| d | 494467 | 4.4% |
| Other values (9) | 3157831 |
InsulationThickness
| Distinct | 183 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 221093 |
| Missing (%) | 22.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 31.59133146 |
| Minimum | 0 |
| Maximum | 1872 |
| Zeros | 122900 |
| Zeros (%) | 12.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 25 |
| median | 30 |
| Q3 | 40 |
| 95-th percentile | 80 |
| Maximum | 1872 |
| Range | 1872 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 22.94257629 |
|---|---|
| Coefficient of variation (CV) | 0.7262301155 |
| Kurtosis | 1133.880446 |
| Mean | 31.59133146 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 15.44642207 |
| Sum | 24775659.65 |
| Variance | 526.361807 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 30 | 147632 | |
| 0 | 122900 | |
| 25 | 117613 | |
| 50 | 105097 | |
| 35 | 92410 | |
| 40 | 62376 | 6.2% |
| 20 | 46638 | 4.6% |
| 80 | 34369 | 3.4% |
| 60 | 14115 | 1.4% |
| 15 | 7712 | 0.8% |
| Other values (173) | 33393 | 3.3% |
| (Missing) | 221093 | 22.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 122900 | |
| 1 | 190 | < 0.1% |
| 1.752 | 1 | < 0.1% |
| 1.79 | 1 | < 0.1% |
| 1.89 | 2 | < 0.1% |
| 1.91 | 5 | < 0.1% |
| 1.92 | 1 | < 0.1% |
| 2 | 62 | < 0.1% |
| 2.33 | 1 | < 0.1% |
| 2.35 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1872 | 21 | |
| 890 | 1 | < 0.1% |
| 870 | 1 | < 0.1% |
| 801 | 1 | < 0.1% |
| 800 | 1 | < 0.1% |
| 670 | 2 | < 0.1% |
| 660 | 1 | < 0.1% |
| 600 | 4 | < 0.1% |
| 580 | 3 | < 0.1% |
| 560 | 1 | < 0.1% |
TotalDeliveredEnergy
Real number (ℝ)
HIGH CORRELATION, MISSING, SKEWED
| Distinct | 431407 |
|---|---|
| Distinct (%) | 99.2% |
| Missing | 570448 |
| Missing (%) | 56.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24403.69201 |
| Minimum | -3929.793 |
| Maximum | 5431169.676 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 3 |
| Negative (%) | < 0.1% |
| Memory size | 7.7 MiB |
Quantile statistics
| Minimum | -3929.793 |
|---|---|
| 5-th percentile | 8305.22705 |
| Q1 | 15190.84325 |
| median | 21318.4585 |
| Q3 | 29593.121 |
| 95-th percentile | 49214.57105 |
| Maximum | 5431169.676 |
| Range | 5435099.469 |
| Interquartile range (IQR) | 14402.27775 |
Descriptive statistics
| Standard deviation | 23488.45498 |
|---|---|
| Coefficient of variation (CV) | 0.9624959602 |
| Kurtosis | 14527.40373 |
| Mean | 24403.69201 |
| Median Absolute Deviation (MAD) | 6932.2075 |
| Skewness | 87.3140669 |
| Sum | 1.061316566 × 10¹⁰ |
| Variance | 551707517.1 |
| Monotonicity | Not monotonic |
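Several of the TotalDeliveredEnergy figures above are derived from one another. The sketch below recomputes them from the values printed in this report as a consistency check. The variable names are illustrative, and the arithmetic is the textbook definition of each statistic, not necessarily the exact code path the profiler uses.

```python
from math import isclose

# Values copied from the tables in this report.
n_observations = 1005348      # "Number of observations" in the dataset statistics
n_missing = 570448            # Missing count for TotalDeliveredEnergy
mean = 24403.69201
std = 23488.45498
minimum, maximum = -3929.793, 5431169.676
q1, q3 = 15190.84325, 29593.121

n_present = n_observations - n_missing          # rows with a value

missing_pct = 100 * n_missing / n_observations  # share of missing cells
cv = std / mean                                 # coefficient of variation
value_range = maximum - minimum                 # range
iqr = q3 - q1                                   # interquartile range
variance = std ** 2                             # variance
total = mean * n_present                        # sum of all present values

print(f"missing  = {missing_pct:.1f}%")
print(f"CV       = {cv:.10f}")
print(f"range    = {value_range:.3f}")
print(f"IQR      = {iqr:.5f}")
print(f"variance = {variance:.1f}")
print(f"sum      = {total:.9e}")
```

Each recomputed value should agree with the corresponding table entry up to the rounding of the printed inputs.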
Common Values
| Value | Count | Frequency (%) |
| 184.697 | 22 | < 0.1% |
| 641.717 | 16 | < 0.1% |
| 184.877 | 14 | < 0.1% |
| 541.268 | 12 | < 0.1% |
| 919.127 | 12 | < 0.1% |
| 477.232 | 12 | < 0.1% |
| 292.044 | 11 | < 0.1% |
| 99.907 | 11 | < 0.1% |
| 992.339 | 10 | < 0.1% |
| 368.215 | 9 | < 0.1% |
| Other values (431397) | 434771 | |
| (Missing) | 570448 | 56.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -3929.793 | 1 | |
| -2843.478 | 1 | |
| -1805.047 | 1 | |
| 50.563 | 1 | |
| 56.181 | 1 | |
| 69.853 | 1 | |
| 72.877 | 1 | |
| 73.373 | 1 | |
| 74.998 | 1 | |
| 77.344 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 5431169.676 | 1 | |
| 4129347.104 | 1 | |
| 3846582.646 | 1 | |
| 3444097.206 | 1 | |
| 3343274.743 | 1 | |
| 3133044.042 | 1 | |
| 2868451.102 | 1 | |
| 2844373.55 | 1 | |
| 2370104.193 | 1 | |
| 2050664.845 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
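The calculation described above can be sketched in plain Python. This is a minimal illustration on made-up data, not the profiler's implementation. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical, and ties are assumed to receive the mean of their rank positions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(f"rho = {rho:.3f}, r = {r:.3f}")
```

On this toy input ρ equals 1 up to floating-point rounding, because the ranks of x and y coincide, while r stays below 1 because the relationship is not linear.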
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
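As a small sketch of that formula, the snippet below computes r by hand and checks the invariance property on made-up numbers. The function name `pearson` and the data are illustrative only. The invariance holds for positive scale factors; a negative factor on one variable flips the sign of r.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Toy data (made up).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]

r = pearson(x, y)
# Shift and rescale each variable separately, using positive scale factors.
r_shifted = pearson([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])
# A negative scale factor on one variable flips the sign of r.
r_flipped = pearson([-a for a in x], y)

print(r, r_shifted, r_flipped)
```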
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.
To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
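The pair-counting definition can be written out directly. The sketch below implements the simplest variant (often called τ-a), in which tied pairs count as neither concordant nor discordant. Statistical libraries frequently report tie-adjusted variants such as τ-b, so results can differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with ties counted as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy data (made up): only the pair (2, 3) vs (3, 2) is discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)  # 5 concordant, 1 discordant, 6 pairs in total
print(tau)
```

The double loop visits every pair, so this form is quadratic in the number of observations and only suitable for small illustrations.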
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | CountyName | DwellingTypeDescr | YearofConstruction | EnergyRating | BerRating | GroundFloorArea(sq m) | CO2Rating | MainSpaceHeatingFuel | MainWaterHeatingFuel | VentilationMethod | StructureType | NoOfSidesSheltered | InsulationType | InsulationThickness | TotalDeliveredEnergy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Donegal | Detached house | 1997 | C2 | 180.01 | 171.19 | 45.53 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | Factory Insulated | 20.00 | 25474.52 |
| 1 | Kildare | Detached house | 2010 | B3 | 137.56 | 242.93 | 35.66 | Heating Oil | Heating Oil | Natural vent. | Masonry | 2.00 | Factory Insulated | 50.00 | 27654.47 |
| 2 | Dublin | Semi-detached house | 1999 | C3 | 223.61 | 99.38 | 44.65 | Mains Gas | Mains Gas | Natural vent. | Masonry | 3.00 | Loose Jacket | 20.00 | 17000.04 |
| 3 | Dublin | Semi-detached house | 1965 | C2 | 196.99 | 138.41 | 37.83 | Mains Gas | Mains Gas | Natural vent. | Masonry | 2.00 | NaN | NaN | 22708.48 |
| 4 | Dublin | Semi-detached house | 1985 | D2 | 260.52 | 127.16 | 55.07 | Mains Gas | Mains Gas | Natural vent. | Masonry | 2.00 | Loose Jacket | 100.00 | 28182.86 |
| 5 | Donegal | House | 1975 | D1 | 248.00 | 88.57 | 62.68 | Heating Oil | Heating Oil | Natural vent. | Masonry | 2.00 | Factory Insulated | 0.00 | 18470.03 |
| 6 | Dublin | Semi-detached house | 1985 | D2 | 275.97 | 73.54 | 58.21 | Mains Gas | Mains Gas | Natural vent. | Masonry | 2.00 | Loose Jacket | 100.00 | 17227.86 |
| 7 | Limerick | Semi-detached house | 1960 | D1 | 244.71 | 89.54 | 59.79 | Heating Oil | Heating Oil | Natural vent. | Masonry | 3.00 | Loose Jacket | 80.00 | 16711.95 |
| 8 | Kerry | House | 1973 | D2 | 293.82 | 157.62 | 71.30 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | Loose Jacket | 50.00 | 40212.93 |
| 9 | Kilkenny | Detached house | 1980 | D2 | 299.96 | 91.44 | 75.47 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | Loose Jacket | 20.00 | 21839.38 |
Last rows
| | CountyName | DwellingTypeDescr | YearofConstruction | EnergyRating | BerRating | GroundFloorArea(sq m) | CO2Rating | MainSpaceHeatingFuel | MainWaterHeatingFuel | VentilationMethod | StructureType | NoOfSidesSheltered | InsulationType | InsulationThickness | TotalDeliveredEnergy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1005338 | Dublin | Top-floor apartment | 2022 | A2 | 46.90 | 77.00 | 8.82 | NaN | NaN | Bal.whole mech.vent heat recvr | Masonry | 2.00 | NaN | NaN | NaN |
| 1005339 | Dublin | Ground-floor apartment | 2004 | D2 | 281.34 | 73.50 | 55.32 | Electricity | Electricity | Natural vent. | Masonry | 3.00 | NaN | NaN | NaN |
| 1005340 | Dublin | Mid-floor apartment | 2020 | A3 | 55.23 | 37.80 | 10.86 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN |
| 1005341 | Dublin | Mid-floor apartment | 2020 | A2 | 46.32 | 50.89 | 9.11 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN |
| 1005342 | Dublin | Mid-floor apartment | 2020 | A2 | 38.60 | 86.58 | 7.59 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN |
| 1005343 | Donegal | Detached house | 1982 | D2 | 282.58 | 214.18 | 70.89 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | NaN | NaN | 52927.53 |
| 1005344 | Dublin | Mid-terrace house | 1900 | G | 998.14 | 99.77 | 317.99 | Manufactured Smokeless Fuel | Electricity | Natural vent. | Masonry | 4.00 | NaN | NaN | NaN |
| 1005345 | Dublin | Mid-floor apartment | 2021 | A2 | 37.26 | 81.32 | 7.33 | NaN | NaN | Whole house extract vent. | Masonry | 2.00 | NaN | NaN | NaN |
| 1005346 | Dublin | Mid-floor apartment | 2022 | A2 | 36.05 | 82.09 | 6.76 | NaN | NaN | Bal.whole mech.vent heat recvr | Masonry | 4.00 | NaN | NaN | NaN |
| 1005347 | Monaghan | Detached house | 2013 | B1 | 90.57 | 334.35 | 19.66 | Electricity | Electricity | Bal.whole mech.vent heat recvr | Timber or Steel Frame | 1.00 | NaN | NaN | NaN |